Data Preprocessing

Rounding some of the numeric columns into larger, coarser buckets to make analysis easier
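
As a rough illustration of the bucketing step, the sketch below rounds two numeric columns into coarser groups. The column names (MonthlyIncome, Age), the sample values, and the bucket widths are assumptions made for the example, not taken from the project data.

```python
import pandas as pd

# Hypothetical sample; column names and values are assumptions for illustration.
df = pd.DataFrame({"MonthlyIncome": [2090, 5130, 19999], "Age": [23, 37, 58]})

# Round income to the nearest 1000 so nearby values share one bucket.
df["Income_bucket"] = (df["MonthlyIncome"] / 1000).round().astype(int) * 1000

# Floor age to the start of its decade (23 -> 20, 37 -> 30).
df["Age_bucket"] = (df["Age"] // 10) * 10
```

Wider buckets give fewer, better-populated groups, at the cost of losing detail within each group.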

Replacing Yes and No with 1 and 0 in the Attrition variable
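
A minimal sketch of this recoding with pandas, assuming the target column is named Attrition and holds the strings "Yes" and "No" (the sample rows are made up):

```python
import pandas as pd

# Hypothetical sample of the target column.
df = pd.DataFrame({"Attrition": ["Yes", "No", "No", "Yes"]})

# Map the string labels to integers: 1 = employee left, 0 = employee stayed.
df["Attrition"] = df["Attrition"].map({"Yes": 1, "No": 0})
```

Any value other than "Yes" or "No" would become NaN under map, which makes unexpected labels easy to spot.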

Getting summary statistics of the features. Dropping features with zero variance, irrelevant features such as Employee Number, and categorical columns that already have an ordered numerical equivalent that can be used instead, such as Educ_bucket
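
The dropping step might look roughly like the sketch below. Apart from Educ_bucket and the employee number, the column names (StandardHours, Education, Age) and all values are hypothetical; in particular, Education stands in for whichever categorical column duplicates Educ_bucket.

```python
import pandas as pd

# Hypothetical sample; only Educ_bucket and the employee number come from the notes.
df = pd.DataFrame({
    "EmployeeNumber": [1, 2, 3, 4],
    "StandardHours": [80, 80, 80, 80],   # constant, so zero variance
    "Education": ["Bachelor", "Master", "Bachelor", "Doctor"],
    "Educ_bucket": [3, 4, 3, 5],         # ordered numeric version of Education
    "Age": [23, 37, 41, 58],
})

# Summary statistics of the numeric features.
stats = df.describe()

# A column with a single distinct value has zero variance.
zero_var = [c for c in df.columns if df[c].nunique() == 1]

# Drop constant columns, the identifier, and the redundant categorical column.
df = df.drop(columns=zero_var + ["EmployeeNumber", "Education"])
```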

Correlation Matrix: highly correlated variables could have been removed if model performance had needed further improvement
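
One common way to find removal candidates from a correlation matrix is sketched below. This step was not applied in the project; the 0.9 threshold, the column names, and the values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; JobLevel and MonthlyIncome are nearly collinear here.
df = pd.DataFrame({
    "JobLevel": [1, 2, 3, 4, 5],
    "MonthlyIncome": [2000, 4100, 6050, 8000, 9900],
    "DistanceFromHome": [5, 1, 9, 2, 7],
})

# Absolute pairwise Pearson correlations.
corr = df.corr().abs()

# Keep only the upper triangle so each pair is considered once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Flag the later column of every pair whose correlation exceeds the threshold.
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
```

Here to_drop contains MonthlyIncome, since it is almost perfectly correlated with JobLevel.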

Getting target variable in Y and rest of the features in X

Using OneHotEncoding for non ordered Categorical Features, OrdinalEncoding for ordered Categorical Features and StandardScaler to scale all numerical features
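
The X/Y split and the three transformers can be combined with scikit-learn's ColumnTransformer, roughly as sketched below. The column names, the sample rows, and the category order given to OrdinalEncoder are assumptions for the example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Hypothetical sample; column names and values are assumptions.
df = pd.DataFrame({
    "Attrition": [1, 0, 0, 1],
    "Department": ["Sales", "R&D", "HR", "Sales"],                           # unordered
    "BusinessTravel": ["Rarely", "Frequently", "Non-Travel", "Rarely"],      # ordered
    "Age": [23, 37, 41, 58],                                                 # numeric
})

# Target in Y, remaining features in X.
Y = df["Attrition"]
X = df.drop(columns="Attrition")

# Route each group of columns to the matching transformer.
preprocess = ColumnTransformer(
    [
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["Department"]),
        ("ordinal", OrdinalEncoder(categories=[["Non-Travel", "Rarely", "Frequently"]]), ["BusinessTravel"]),
        ("scale", StandardScaler(), ["Age"]),
    ],
    sparse_threshold=0,  # always return a dense array
)
X_enc = preprocess.fit_transform(X)
```

The output has three one-hot columns for Department, one ordinal column for BusinessTravel, and one scaled column for Age.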

Modelling

Splitting into Train and Test Sets

Running Logistic Regression Model for benchmark
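
A minimal sketch of the split and the benchmark model follows. It uses synthetic data from make_classification in place of the real features; the 80/20 split, the stratify option, and the class imbalance are assumptions rather than the project's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the preprocessed X and Y.
X, Y = make_classification(
    n_samples=500, n_features=10, weights=[0.84, 0.16], random_state=0
)

# Hold out 20% for testing; stratify keeps the class ratio similar in both sets.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, stratify=Y, random_state=0
)

# Logistic regression as a simple baseline to compare later models against.
baseline = LogisticRegression(max_iter=1000).fit(X_train, Y_train)
baseline_accuracy = baseline.score(X_test, Y_test)
```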

Running Random Forest Model with best Hyperparameters

Using Grid Search CV for Hyperparameter Tuning and Stratified Cross Validation for better sampling
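
The tuning step could be sketched as below, again on synthetic data. The parameter grid, the number of folds, and scoring on recall are assumptions chosen to keep the example small; the notes do not list the actual grid.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic, imbalanced stand-in for the training data.
X, Y = make_classification(
    n_samples=300, n_features=8, weights=[0.8, 0.2], random_state=0
)

# Hypothetical grid; the real search space is not given in the notes.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

# Stratified folds preserve the class ratio in every train/validation split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, scoring="recall", cv=cv
)
search.fit(X, Y)

# By default the best parameter combination is refit on all the data passed to fit.
best_model = search.best_estimator_
```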

Checking Recall for the model, as we are most concerned with not misclassifying churners as non-churners (false negatives)
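
Recall is the share of actual churners the model catches, TP / (TP + FN). The sketch below computes it on made-up labels, with 1 meaning churner:

```python
from sklearn.metrics import confusion_matrix, recall_score

# Made-up labels: 4 actual churners, 6 non-churners.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

# Recall for the positive class (label 1).
recall = recall_score(y_true, y_pred)

# For binary labels, ravel() yields tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
```

One of the four churners is missed (fn = 1), so recall is 3 / 4 = 0.75.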

Creating dictionary of features and their importance to find out top 3 contributing features
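
A sketch of building that dictionary from a fitted random forest is shown below. It uses synthetic data and placeholder feature names (feature_0, feature_1, ...), so the resulting top 3 says nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the preprocessed features.
X, Y = make_classification(
    n_samples=300, n_features=6, n_informative=3, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, Y)

# Pair each feature name with its impurity-based importance.
importance = dict(zip(feature_names, model.feature_importances_))

# Sort names by importance, highest first, and keep the top 3.
top3 = sorted(importance, key=importance.get, reverse=True)[:3]
```

When the features pass through one-hot encoding first, the names need to come from the fitted transformer so that they line up with the encoded columns.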

Data Analysis

Grouping by 'Attrition' column and seeing how behavior changes with different values for different features
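
For example, a grouped mean shows how each feature differs between employees who left and those who stayed. The feature columns and values below are made up for illustration:

```python
import pandas as pd

# Hypothetical sample; Attrition is already encoded as 1 (left) / 0 (stayed).
df = pd.DataFrame({
    "Attrition": [1, 0, 0, 1, 0],
    "MonthlyIncome": [2000, 6000, 8000, 3000, 7000],
    "OverTime": [1, 0, 0, 1, 1],
})

# One row per Attrition value, one column per feature mean.
summary = df.groupby("Attrition").mean()
```

In this toy sample the group that left has a lower mean income and a higher overtime rate than the group that stayed.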